OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Chinese text distinction and font identification by recognizing most frequently used characters

Internal identifier: 001B57 (Main/Exploration); previous: 001B56; next: 001B58

Authors: Chi-Fang Lin [Taiwan, People's Republic of China]; Yu-Fan Fang [People's Republic of China]; Yau-Tarng Juang [People's Republic of China]

Source: Image and Vision Computing, vol. 19, no. 6, pp. 329-338

RBID : ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3

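English descriptors

Character recognition; Feature extraction; Font identification; Template matching; Text distinction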

Abstract

This study proposes a method for implementing three functions that can greatly assist a traditional OCCR (Optical Chinese Character Recognition) system: (1) identifying the font used in a document; (2) detecting and recognizing the most frequently used (MFU) characters; and (3) distinguishing between machine-printed and handwritten characters. According to a study by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), the top-40 MFU characters account for about 20% of the Chinese characters in a text document. If those MFU characters can be detected before the traditional OCCR method is applied, computation time can be reduced considerably. The proposed character detection method consists of three stages: segmentation, feature extraction, and classification. In the first stage, the projection-profile-based method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is used to segment individual characters from the input text document. In the second stage, three types of features are introduced: the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether a segmented character is semi-matched or fully matched with an MFU template. In the last stage, based on the matching result, three algorithms implementing the aforementioned functions are provided. Experimental results are given to demonstrate the practicality and superiority of the proposed method.
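
The first two stages named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, nor the method of Wang et al.: it assumes a binary image stored as nested lists (1 = black pixel), splits a text line wherever a column contains no black pixels, and computes the black-pixel density of each segment. The function names `column_profile`, `segment_columns`, and `black_density` are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple

# Assumption: a binary image is a list of rows, with 1 = black (ink) and 0 = white.
Image = List[List[int]]


def column_profile(img: Image) -> List[int]:
    """Vertical projection profile: number of black pixels in each column."""
    if not img:
        return []
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Split a text line into [start, end) column spans separated by blank columns."""
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_profile(img)):
        if count > 0 and start is None:
            start = x                      # first inked column of a new segment
        elif count == 0 and start is not None:
            spans.append((start, x))       # blank column closes the segment
            start = None
    if start is not None:                  # segment running to the right edge
        spans.append((start, len(img[0])))
    return spans


def black_density(img: Image, span: Tuple[int, int]) -> float:
    """Fraction of black pixels inside the given column span."""
    x0, x1 = span
    area = len(img) * (x1 - x0)
    black = sum(sum(row[x0:x1]) for row in img)
    return black / area if area else 0.0


# Toy 3x6 "text line" containing two ink blobs separated by a blank column.
line = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
]
spans = segment_columns(line)
densities = [black_density(line, s) for s in spans]
```

Splitting on blank columns is deliberately naive: many Chinese characters contain internal vertical gaps, so a practical segmenter needs additional rules (for example, on expected character width) that this sketch does not attempt.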

Url: https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf
DOI: 10.1016/S0262-8856(00)00082-2



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author>
<name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</author>
<author>
<name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
</author>
<author>
<name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0262-8856(00)00082-2</idno>
<idno type="url">https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000057</idno>
<idno type="wicri:Area/Istex/Curation">000056</idno>
<idno type="wicri:Area/Istex/Checkpoint">001211</idno>
<idno type="wicri:doubleKey">0262-8856:2001:Lin C:chinese:text:distinction</idno>
<idno type="wicri:Area/Main/Merge">001C50</idno>
<idno type="wicri:Area/Main/Curation">001B57</idno>
<idno type="wicri:Area/Main/Exploration">001B57</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author>
<name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
<affiliation wicri:level="1">
<country wicri:rule="url">Taïwan</country>
</affiliation>
<affiliation wicri:level="1">
<country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<affiliation wicri:level="1">
<country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
<affiliation wicri:level="1">
<country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="329">329</biblScope>
<biblScope unit="page" to="338">338</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<idno type="DOI">10.1016/S0262-8856(00)00082-2</idno>
<idno type="PII">S0262-8856(00)00082-2</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Feature extraction</term>
<term>Font identification</term>
<term>Template matching</term>
<term>Text distinction</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This study proposes a method for implementing three functions that can greatly assist a traditional OCCR (Optical Chinese Character Recognition) system: (1) identifying the font used in a document; (2) detecting and recognizing the most frequently used (MFU) characters; and (3) distinguishing between machine-printed and handwritten characters. According to a study by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), the top-40 MFU characters account for about 20% of the Chinese characters in a text document. If those MFU characters can be detected before the traditional OCCR method is applied, computation time can be reduced considerably. The proposed character detection method consists of three stages: segmentation, feature extraction, and classification. In the first stage, the projection-profile-based method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is used to segment individual characters from the input text document. In the second stage, three types of features are introduced: the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether a segmented character is semi-matched or fully matched with an MFU template. In the last stage, based on the matching result, three algorithms implementing the aforementioned functions are provided. Experimental results are given to demonstrate the practicality and superiority of the proposed method.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
<li>Taïwan</li>
</country>
</list>
<tree>
<country name="Taïwan">
<noRegion>
<name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</noRegion>
</country>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</noRegion>
<name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001B57 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001B57 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3
   |texte=   Chinese text distinction and font identification by recognizing most frequently used characters
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024